{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "# Ship data analysis example\n",
    "\n",
    "<img align=\"right\" src=\"https://anitagraser.github.io/movingpandas/assets/img/movingpandas.png\">\n",
    "\n",
    "[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/anitagraser/movingpandas-examples/main?filepath=2-analysis-examples/ship-data.ipynb)\n",
    "[![IPYNB](https://img.shields.io/badge/view-ipynb-hotpink)](https://github.com/anitagraser/movingpandas-examples/blob/main/2-analysis-examples/ship-data.ipynb)\n",
    "[![HTML](https://img.shields.io/badge/view-html-green)](https://anitagraser.github.io/movingpandas-website/2-analysis-examples/ship-data.html)\n",
    "\n",
    "This tutorial uses AIS data published by the Danish Maritime Authority. The AIS record sample extracted for this tutorial covers vessel traffic on the 5th July 2017 near Gothenburg.\n",
    "\n",
    "This tutorial covers: \n",
    "1. Trajectory data preprocessing\n",
    "   * Loading movement data from common geospatial file formats \n",
    "   * Exploring spatial & non-spatial data distributions \n",
    "   * Applying filters to extract relevant data\n",
    "   * Converting GeoDataFrames into Trajectories describing continuous tracks of moving objects\n",
    "1. Trajectory analysis\n",
    "   * Visualizing trajectories and their properties\n",
    "   * Filtering trajectories by area of interest\n",
    "   * Splitting continuous tracks into individual trips\n",
    "   * Exploring trip properties including: origins, destinations, and attributes "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "from geopandas import GeoDataFrame, read_file\n",
    "from shapely.geometry import Point, LineString, Polygon\n",
    "from datetime import datetime, timedelta\n",
    "import matplotlib.pyplot as plt\n",
    "import movingpandas as mpd\n",
    "from holoviews import opts, dim\n",
    "\n",
    "import warnings\n",
    "warnings.simplefilter(\"ignore\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mpd.__version__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loading sample AIS data \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "df = read_file('../data/ais.gpkg')\n",
    "print(f\"Finished reading {len(df)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's see what the data looks like:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.plot()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we look at the data distributions, we can see that there are a lot of records with speed over ground (SOG) values of zero in this dataframe:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df['SOG'].hist(bins=100, figsize=(15,3))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's get rid of these rows with zero SOG:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f\"Original size: {len(df)} rows\")\n",
    "df = df[df.SOG>0]\n",
    "print(f\"Reduced to {len(df)} rows after removing 0 speed records\")\n",
    "df['SOG'].hist(bins=100, figsize=(15,3))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's see what kind of ships we have in our dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df['ShipType'].value_counts().plot(kind='bar', figsize=(15,3))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, let's create trajectories:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "df['t'] = pd.to_datetime(df['Timestamp'], format='%d/%m/%Y %H:%M:%S')\n",
    "traj_collection = mpd.TrajectoryCollection(df, 'MMSI', t='t', min_length=100)\n",
    "print(f\"Finished creating {len(traj_collection)} trajectories\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "traj_collection = mpd.MinTimeDeltaGeneralizer(traj_collection).generalize(tolerance=timedelta(minutes=1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plotting trajectories\n",
    "\n",
    "Let's give the most common ship types distinct colors. The remaining ones will be just grey:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "shiptype_to_color = {'Passenger': 'blue', 'HSC': 'green', 'Tanker': 'red', 'Cargo': 'orange'}\n",
    "traj_collection.plot(column='ShipType', column_to_color=shiptype_to_color, linewidth=1, capstyle='round')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "passenger = traj_collection.filter('ShipType', 'Passenger')\n",
    "passenger.hvplot(title='Passenger ferries', line_width=2, frame_width=700, frame_height=500)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Visualizing trajectory properties"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also plot individual trajectories to better visualize their properties, such as the changes in NavStatus:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "my_traj = traj_collection.trajectories[0]\n",
    "my_traj.df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "my_traj.df.tail()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "my_traj.hvplot(title=f'Trajectory {my_traj.id}', frame_width=700, frame_height=500, line_width=5.0, c='NavStatus', cmap='Dark2') "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Finding ships passing under Älvsborgsbron bridge\n",
    "We can find ships passing under the bridge based on trajectory intersections with the bridge area."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "area_of_interest = Polygon([(11.89935, 57.69270), (11.90161, 57.68902), (11.90334, 57.68967), (11.90104, 57.69354), (11.89935, 57.69270)])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "intersecting = traj_collection.get_intersecting(area_of_interest)\n",
    "print(f\"Found {len(intersecting)} intersections\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bridge_traj = intersecting.trajectories[0]\n",
    "bridge_traj.hvplot(title=f'Trajectory {bridge_traj.id}', frame_width=700, frame_height=500, line_width=5.0, c='NavStatus', cmap='Dark2') "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bridge_traj.df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Identifying trip origins and destinations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since AIS records with a speed over ground (SOG) value of zero have been removed from the dataset, we can use the `split_by_observation_gap()` function to split the continuous observations into individual trips:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "trips = mpd.ObservationGapSplitter(passenger).split(gap=timedelta(minutes=5))\n",
    "print(f\"Extracted {len(trips)} individual trips from {len(passenger)} continuous vessel tracks\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's plot the resulting trips!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "trips.hvplot(title='Passenger ferry trips', line_width=2, frame_width=700, frame_height=500)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Compared to plotting the original continuous observations, this visualization is much cleaner because there are no artifacts at the border of the area of interest. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, let's get the trip origins:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "origins = trips.get_start_locations()\n",
    "origins.hvplot(title='Trip origins by ship type', c='Name', geo=True, tiles='OSM', frame_width=700, frame_height=500)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In our data sample, trip origins can be:\n",
    "- When a ship departs its anchoring location and the speed changes from 0 to >0\n",
    "- When a ship trajectory first enters the observation area"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "origins.hvplot(title='Origins by speed', c='SOG', geo=True, tiles='OSM', frame_width=700, frame_height=500)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Finding ships that depart from Sjöfartsverket (Maritime Administration)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "trips = mpd.ObservationGapSplitter(traj_collection).split(gap=timedelta(minutes=5))\n",
    "area_of_interest = Polygon([(11.86815, 57.68273), (11.86992, 57.68047), (11.87419, 57.68140), (11.87288, 57.68348), (11.86815, 57.68273)])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can identify vessels that start their trip within a given area of interest by intersecting trip starting locations with our area of interest:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "departures = [traj for traj in trips if traj.get_start_location().intersects(area_of_interest) and traj.get_length() > 100]       \n",
    "print(f\"Found {len(departures)} departures\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tc = mpd.TrajectoryCollection(departures)\n",
    "tc.hvplot(title=f'Ships departing from Sjöfartsverket', line_width=3, frame_width=700, frame_height=500, hover_cols=['Name']) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's see what kind of ships depart from here:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for traj in departures:\n",
    "    print(f\"{traj.df['ShipType'].iloc[0]} vessel '{traj.df['Name'].iloc[0]}' departed at {traj.get_start_time()}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Of course, the same works for arrivals:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "arrivals = [traj for traj in trips if traj.get_end_location().intersects(area_of_interest) and traj.get_length() > 100]\n",
    "print(f\"Found {len(arrivals)} arrivals\")\n",
    "\n",
    "for traj in arrivals:\n",
    "    print(f\"{traj.df['ShipType'].iloc[0]} vessel '{traj.df['Name'].iloc[0]}' arrived at {traj.get_end_time()}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tc = mpd.TrajectoryCollection(arrivals)\n",
    "tc.hvplot(title=f'Ships arriving in Sjöfartsverket', line_width=3, frame_width=700, frame_height=500, hover_cols=['Name']) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Clustering origins\n",
    "\n",
    "To run this section, you need to have the scikit-learn package installed. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.cluster import DBSCAN\n",
    "from geopy.distance import great_circle\n",
    "from shapely.geometry import MultiPoint"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "origins = trips.get_start_locations()\n",
    "origins['lat'] = origins.geometry.y\n",
    "origins['lon'] = origins.geometry.x\n",
    "matrix = origins[['lat','lon']].values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "kms_per_radian = 6371.0088\n",
    "epsilon = 0.1 / kms_per_radian"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "db = DBSCAN(eps=epsilon, min_samples=1, algorithm='ball_tree', metric='haversine').fit(np.radians(matrix))\n",
    "cluster_labels = db.labels_\n",
    "num_clusters = len(set(cluster_labels))\n",
    "clusters = pd.Series([matrix[cluster_labels == n] for n in range(num_clusters)])\n",
    "print(f'Number of clusters: {num_clusters}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "origins['cluster'] = cluster_labels"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_centermost_point(cluster):\n",
    "    centroid = (MultiPoint(cluster).centroid.x, MultiPoint(cluster).centroid.y)\n",
    "    centermost_point = min(cluster, key=lambda point: great_circle(point, centroid).m)\n",
    "    return Point(tuple(centermost_point)[1], tuple(centermost_point)[0])\n",
    "centermost_points = clusters.map(get_centermost_point)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "origins.hvplot(title='Clustered origins', c='cluster', geo=True, tiles='OSM', cmap='glasbey_dark', frame_width=700, frame_height=500)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "origins_by_cluster = pd.DataFrame(origins).groupby(['cluster'])\n",
    "summary = origins_by_cluster['ShipType'].unique().to_frame(name='types')\n",
    "summary['n'] = origins_by_cluster.size()\n",
    "summary['sog'] = origins_by_cluster['SOG'].mean()\n",
    "summary['geometry'] = centermost_points\n",
    "summary = summary[summary['n']>1].sort_values(by='n', ascending=False)\n",
    "summary.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cluster_of_interest_id = 28\n",
    "origins[origins['cluster']==cluster_of_interest_id].hvplot(\n",
    "    title=f'Cluster {cluster_of_interest_id}', c='ShipType', geo=True, tiles='OSM', frame_width=700, frame_height=500)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "( trips.hvplot(title='Origin clusters by speed', color='gray', line_width=1, frame_width=700, frame_height=500) *\n",
    "  GeoDataFrame(summary, crs=4326).hvplot(c='sog', size=np.sqrt(dim('n'))*3, geo=True,  cmap='RdYlGn')\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Continue exploring MovingPandas\n",
    "\n",
    "1. [Bird migration analysis](bird-migration.ipynb)\n",
    "1. [Ship data analysis](ship-data.ipynb)\n",
    "1. [Horse collar data exploration](horse-collar.ipynb)\n",
    "1. [OSM traces](osm-traces.ipynb)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
